在线学习和决策中的一个核心问题 - 从土匪到强化学习 - 是要了解哪种建模假设会导致样本有效的学习保证。我们考虑了一个普遍的对抗性决策框架,该框架涵盖了(结构化的)匪徒问题,这些问题与对抗性动力学有关。我们的主要结果是通过新的上限和下限显示决策估计系数,这是Foster等人引入的复杂度度量。在与我们环境的随机对应物中,对于对抗性决策而言是必要和足够的遗憾。但是,与随机设置相比,必须将决策估计系数应用于所考虑的模型类(或假设)的凸壳。这就确定了容纳对抗奖励或动态的价格受凸层化模型类的行为的约束,并恢复了许多现有结果 - 既积极又负面。在获得这些保证的途径中,我们提供了新的结构结果,将决策估计系数与其他众所周知的复杂性度量的变体联系起来,包括Russo和Van Roy的信息比以及Lattimore和Gy的探索目标\“ {o} rgy。
translated by 谷歌翻译
Epsilon-Greedy,SoftMax或Gaussian噪声等近视探索政策在某些强化学习任务中无法有效探索,但是在许多其他方面,它们的表现都很好。实际上,实际上,由于简单性,它们通常被选为最佳选择。但是,对于哪些任务执行此类政策成功?我们可以为他们的有利表现提供理论保证吗?尽管这些政策具有显着的实际重要性,但这些关键问题几乎没有得到研究。本文介绍了对此类政策的理论分析,并为通过近视探索提供了对增强学习的首次遗憾和样本复杂性。我们的结果适用于具有有限的Bellman Eluder维度的情节MDP中的基于价值功能的算法。我们提出了一种新的复杂度度量,称为近视探索差距,用Alpha表示,该差距捕获了MDP的结构属性,勘探策略和给定的值函数类别。我们表明,近视探索的样品复杂性与该数量的倒数1 / alpha^2二次地量表。我们通过具体的例子进一步证明,由于相应的动态和奖励结构,在近视探索成功的几项任务中,近视探索差距确实是有利的。
translated by 谷歌翻译
Quadruped robots are currently used in industrial robotics as mechanical aid to automate several routine tasks. However, presently, the usage of such a robot in a domestic setting is still very much a part of the research. This paper discusses the understanding and virtual simulation of such a robot capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expression on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains and detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish the framework of simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio response. The emotion detection from the speech was not as performant as ERANNs or Zeta Policy learning, still managing an accuracy of 63.5%. The video emotion detection system produced results that are almost at par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm was extremely rapid to learn, allowing the simulated dog to demonstrate a remarkably seamless gait across the different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
translated by 谷歌翻译
Text-to-text generation models have increasingly become the go-to solution for a wide variety of sequence labeling tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-$k$ predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
translated by 谷歌翻译
We consider the task of text generation in language models with constraints specified in natural language. To this end, we first create a challenging benchmark Cognac that provides as input to the model a topic with example text, along with a constraint on text to be avoided. Unlike prior work, our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata, which allows for straightforward evaluation while striking a balance between broad attribute-level and narrow lexical-level controls. We find that even state-of-the-art language models like GPT-3 fail often on this task, and propose a solution to leverage a language model's own internal knowledge to guide generation. Our method, called CognacGen, first queries the language model to generate guidance terms for a specified topic or constraint, and uses the guidance to modify the model's token generation probabilities. We propose three forms of guidance (binary verifier, top-k tokens, textual example), and employ prefix-tuning approaches to distill the guidance to tackle diverse natural language constraints. Through extensive empirical evaluations, we demonstrate that CognacGen can successfully generalize to unseen instructions and outperform competitive baselines in generating constraint conforming text.
translated by 谷歌翻译
Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and number of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely, prefix matching and copying. These induction heads overlap with task-specific important heads, suggesting that induction heads are among the heads capable of more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights that indicate large language models may be under-trained to perform in-context learning and opens up questions on how to pre-train language models to more effectively perform in-context learning.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
translated by 谷歌翻译
Many existing datasets for lidar place recognition are solely representative of structured urban environments, and have recently been saturated in performance by deep learning based approaches. Natural and unstructured environments present many additional challenges for the tasks of long-term localisation but these environments are not represented in currently available datasets. To address this we introduce Wild-Places, a challenging large-scale dataset for lidar place recognition in unstructured, natural environments. Wild-Places contains eight lidar sequences collected with a handheld sensor payload over the course of fourteen months, containing a total of 67K undistorted lidar submaps along with accurate 6DoF ground truth. Our dataset contains multiple revisits both within and between sequences, allowing for both intra-sequence (i.e. loop closure detection) and inter-sequence (i.e. re-localisation) place recognition. We also benchmark several state-of-the-art approaches to demonstrate the challenges that this dataset introduces, particularly the case of long-term place recognition due to natural environments changing over time. Our dataset and code will be available at https://csiro-robotics.github.io/Wild-Places.
translated by 谷歌翻译
Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.
translated by 谷歌翻译